Pronunciation lexicon adaptation for TTS voice building
Abstract
This paper describes how to reduce phone label errors in TTS voice building by modeling speaker pronunciation variants. Each speaker has his or her own pronunciations (and context-dependent variations), so no single standard lexicon can cover all of a speaker's variations. Creating speaker-dependent pronunciation lexicons for the automatic speech labeling of our TTS voice databases helped eliminate many pronunciation errors caused by mismatches between the lexical pronunciation and how the speaker (voice talent) actually pronounced a word. We also found that it contributed to other improvements in synthesis quality. A perceptual test showed that our work improved MOS for American English male and female voices.
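The abstract does not say how the speaker-dependent lexicon is constructed. As a rough illustration only, the sketch below assumes that each recorded word has already been assigned an observed phone sequence (for example by an aligner that chooses among candidate variants) and keeps the variant the speaker used most often, falling back to the standard lexicon otherwise. The function name `build_speaker_lexicon`, the data layout, and the ARPAbet-style phone labels are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def build_speaker_lexicon(base_lexicon, observed):
    """Pick one pronunciation per word for a single speaker.

    base_lexicon: dict mapping word -> sequence of phones (standard lexicon).
    observed: iterable of (word, phones) pairs seen in the speaker's data.
    Both structures are assumptions made for this sketch.
    """
    # Count how often the speaker produced each variant of each word.
    counts = defaultdict(Counter)
    for word, phones in observed:
        counts[word][tuple(phones)] += 1

    speaker_lexicon = {}
    for word, default in base_lexicon.items():
        if word in counts:
            # Keep the variant this speaker used most often.
            speaker_lexicon[word] = counts[word].most_common(1)[0][0]
        else:
            # No evidence for this word: keep the standard pronunciation.
            speaker_lexicon[word] = tuple(default)
    return speaker_lexicon


base = {
    "either": ("IY", "DH", "ER"),
    "data": ("D", "EY", "T", "AH"),
}
seen = [
    ("either", ["AY", "DH", "ER"]),
    ("either", ["AY", "DH", "ER"]),
    ("either", ["IY", "DH", "ER"]),
]
lexicon = build_speaker_lexicon(base, seen)
```

In this toy run the speaker's majority variant replaces the standard entry for "either", while "data", which never appears in the observations, keeps its standard pronunciation. Words observed but absent from the base lexicon are ignored here; a real system would need a policy for them.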
Similar resources
A Generative Model of a Pronunciation Lexicon for Hindi
Voice browser applications in Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) systems crucially depend on a pronunciation lexicon. The present paper describes the model of pronunciation lexicon of Hindi developed to automatically generate the output forms of Hindi at two levels, the and the (PS, in short for Prosodic Structure). The latter level involves both syllable-...
TTS From Zero: Building Synthetic Voices for New Languages
A developer wanting to create a speech synthesizer in a new voice for an under-resourced language faces hard problems. These include difficult decisions in defining a phoneme set and a laborious process of accumulating a pronunciation lexicon. Previously this has been handled through involvement of a language technologies expert. By definition, experts are in short supply. The goal of this thes...
Speaker adaptation using a parallel phone set pronunciation dictionary for Thai-English bilingual TTS
This paper develops a bilingual Thai-English TTS system from two monolingual HMM-based TTS systems. An English Nagoya HMM-based TTS system (HTS) provides correct pronunciations of English words but the voice is different from the voice in a Thai HTS system. We apply a CSMAPLR adaptation technique to make the English voice sound more similar to the Thai voice. To overcome a phone mapping proble...
Pronunciation Modeling F...
Natural and intelligible Text to Speech (TTS) systems exist for a number of languages in the world today. However, there are many languages of the world for which building TTS systems is still prohibitive, due to the lack of linguistic resources and data. Some of these languages are spoken by a large population of the world. Others are primarily spoken languages, or languages with large non-li...
Improving TTS by higher agreement between predicted versus observed pronunciations
This paper looks at improving unit selection text-to-speech (TTS) quality by optimizing the agreement between frontend and speech database. We focused, in particular, on two classes of problems causing degradation in synthesis quality: 1) realization of /d/ and /t/ sounds and 2) confusions of unstressed vowels, especially with schwas. We investigated two approaches to tackling these problems. ...